1 Olin’s Natural Gas Consumption

In this activity, we’ll practice using visualizations to explore relationships between multiple variables. We’ll examine Olin’s natural gas consumption data (July 2012 - November 2022) and resulting bills (January 2017 - November 2022). Here is some information about the data:

We’ll start by loading the data:

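library(tidyverse)  # loads readr, dplyr, ggplot2, and friends
gas_usage <- read_csv("http://faculty.olin.edu/dshuman/DS/olin_natural_gas_usage.csv",
                      col_types = cols(building = col_factor()))  # read building as a factor; guess the other columns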
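The col_types argument controls how readr parses each column. With cols(building = col_factor()), building is stored as a factor and every other column is guessed. Below is a minimal sketch on a tiny inline CSV so it runs without the network. The building names and numbers are made up, the column names year, month, and therms are assumptions about the real file, and wrapping literal data in I() requires readr 2.0 or later:

```r
library(readr)

# Made-up literal CSV; I() tells readr to treat the string as data, not a file path
toy_csv <- I("building,year,month,therms
Building A,2021,1,1200
Building B,2021,1,800
Building A,2021,2,1100")

toy_usage <- read_csv(
  toy_csv,
  col_types = cols(building = col_factor())  # remaining columns are guessed
)

# With levels = NULL (the default), col_factor() uses levels in order of first appearance
str(toy_usage$building)
```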

Here is the monthly natural gas consumption history by building:


Exercise 1.1 (Units of observation) What does each row correspond to in the data table?

Each row corresponds to one building’s natural gas consumption (in therms) during a single month of a single year.
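One way to check this answer is to confirm that no (building, year, month) combination appears more than once. Here is a minimal sketch on made-up toy rows. The column names mirror the assumed layout of gas_usage and are not taken from the real file:

```r
library(dplyr)

# Made-up toy data standing in for gas_usage (column names are assumptions)
toy_usage <- tibble(
  building = c("Building A", "Building B", "Building A", "Building B"),
  year     = c(2021, 2021, 2021, 2021),
  month    = c(1, 1, 2, 2),
  therms   = c(1200, 800, 1100, 750)
)

# If each row is one building in one month, every combination has n == 1
dupes <- toy_usage %>%
  count(building, year, month) %>%
  filter(n > 1)

nrow(dupes)  # 0 means no building-month appears twice
```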


5 Heat Maps to Visualize Multiple Variables

There are several ways in R to build heat maps for examining multiple variables at once. We’ll look at a few here.

5.1 geom_tile()

We can use the geom_tile() layer in the ggplot2 package to make a heat map, as follows:

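# Drop 2012 and 2022, which the data cover only partially
campus_gas_costs2 <- filter(campus_gas_costs, year > 2012, year < 2022)
ggplot(campus_gas_costs2, aes(x = month, y = year, fill = total_therms)) +
  geom_tile() +  # one tile per (month, year); fill shows total therms
  ggtitle("Olin's Total Monthly Natural Gas Consumption")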
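The code above relies on a table campus_gas_costs, built in an earlier section not shown here, with one row per month and a total_therms column. Here is a minimal sketch of one way to derive such a table from building-level rows. It uses made-up toy values and assumes the per-building column is called therms:

```r
library(dplyr)
library(ggplot2)

# Made-up building-level rows (column names are assumptions)
toy_usage <- tibble(
  building = rep(c("Building A", "Building B"), times = 4),
  year     = rep(c(2020, 2021), each = 4),
  month    = rep(c(1, 1, 2, 2), times = 2),
  therms   = c(1200, 800, 1100, 750, 1300, 900, 1000, 700)
)

# Collapse buildings into one campus-wide total per year and month
toy_campus <- toy_usage %>%
  group_by(year, month) %>%
  summarize(total_therms = sum(therms), .groups = "drop")

# One tile per (month, year); fill encodes the campus total
p <- ggplot(toy_campus, aes(x = month, y = year, fill = total_therms)) +
  geom_tile() +
  ggtitle("Toy Campus-Wide Monthly Natural Gas Consumption")
```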


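There are also many color scheme options for a continuous fill, such as scale_fill_gradient, scale_fill_distiller (ColorBrewer palettes for continuous data), and scale_fill_viridis_c. Note that scale_fill_brewer works only with discrete fills, so it would raise an error for total_therms: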

ggplot(campus_gas_costs2,aes(x=month,y=year,fill=total_therms))+
  geom_tile()+
  ggtitle("Olin's Total Monthly Natural Gas Consumption")+
  scale_fill_gradient(low="white", high="blue") 
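Here is a sketch of two other continuous fill scales. The values are synthetic (a cosine seasonal pattern, high in winter and low in summer), not Olin data, and the palette choice is arbitrary:

```r
library(ggplot2)

# Synthetic (not real) seasonal pattern on a year-by-month grid
toy <- expand.grid(month = 1:12, year = 2018:2021)
toy$total_therms <- 30000 + 20000 * cos(2 * pi * (toy$month - 1) / 12)

# ColorBrewer sequential palette for a continuous fill;
# direction = 1 keeps brewer.pal order, so larger values get darker reds
p_distiller <- ggplot(toy, aes(x = month, y = year, fill = total_therms)) +
  geom_tile() +
  scale_fill_distiller(palette = "YlOrRd", direction = 1)

# Perceptually uniform viridis palette
p_viridis <- ggplot(toy, aes(x = month, y = year, fill = total_therms)) +
  geom_tile() +
  scale_fill_viridis_c()
```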

5.2 heatmap.2

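The heatmap.2 function in the gplots package makes similar plots, but it expects a numeric matrix that is already laid out exactly as you want it displayed (here, one row per year and one column per month). Because tibbles are not designed to carry row names, we first convert the tibble to a data frame, use the years as row names, drop the year column, and convert the result to a matrix: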

campus_gas_costs3
## # A tibble: 5 × 13
##    year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  2017 29987 30244 36678 20129 16397 14366 15456  4640 18283 20923 30530 43560
## 2  2018 45839 39610 41359 37562  5679  1214  2176  1555  4810 25654 33531 37935
## 3  2019 42381 41929 36875 22373 21404 11718 10291 12283 16634 23794 34159 39948
## 4  2020 40995 44838 31633 23363  7611  3970   725   820 22041 27299 39442 56493
## 5  2021 29907 56962 38346 30617 19831  1979  1505  1393  9763 25353 33828 38811
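library(gplots)  # provides heatmap.2()
campus_gas_costs3 <- as.data.frame(campus_gas_costs3)  # convert from tibble to data frame
row.names(campus_gas_costs3) <- campus_gas_costs3$year  # label rows by year
campus_gas_costs3 <- campus_gas_costs3[, 2:13]           # keep only the 12 month columns
campus_gas_mat <- data.matrix(campus_gas_costs3)
heatmap.2(campus_gas_mat, Rowv = FALSE, Colv = FALSE,  # keep the original row/column order
          scale = "column",  # standardize each month's column before coloring
          col = heat.colors(256), margins = c(10, 20),
          colsep = 1:ncol(campus_gas_mat), rowsep = 1:nrow(campus_gas_mat),
          sepwidth = c(0.05, 0.05), sepcolor = "white",
          cexRow = 3, cexCol = 3, trace = "none", dendrogram = "none")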
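If the year-by-month values start in long form (one row per year and month, as campus_gas_costs is used in 5.1), tidyr::pivot_wider() and tibble::column_to_rownames() can build the same kind of matrix in fewer steps. Here is a sketch on synthetic values. The column names year, month, and total_therms mirror the ggplot code above, and it assumes months are stored as numbers 1 to 12:

```r
library(tidyr)
library(tibble)

# Synthetic long table (not real data): one row per (year, month)
long <- expand.grid(year = 2017:2018, month = 1:12)
long$total_therms <- seq_len(nrow(long)) * 100

# Spread months into columns, then move year into the row names
wide <- pivot_wider(long, names_from = month, values_from = total_therms)
wide <- column_to_rownames(as.data.frame(wide), var = "year")

# Numeric matrix with years as row names and month abbreviations as column names
campus_mat <- as.matrix(wide)
colnames(campus_mat) <- month.abb[as.integer(colnames(campus_mat))]
```

The resulting campus_mat can be passed to heatmap.2() in place of campus_gas_mat.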

We can also change the color scheme. For example, col="bluered" uses the blue-to-red palette from gplots:

heatmap.2(campus_gas_mat, Rowv = FALSE, Colv = FALSE, scale = "column",
          col = "bluered", margins = c(10, 20),
          colsep = 1:ncol(campus_gas_mat), rowsep = 1:nrow(campus_gas_mat),
          sepwidth = c(0.05, 0.05), sepcolor = "white",
          cexRow = 3, cexCol = 3, trace = "none", dendrogram = "none")
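Besides the name of a palette function such as "bluered", the col argument of heatmap.2 accepts any vector of colors. Here is a sketch that builds a 256-step diverging palette with base R's grDevices::colorRampPalette(). The three anchor colors are an arbitrary choice:

```r
# colorRampPalette() returns a function; calling it with 256 interpolates 256 colors
diverging <- colorRampPalette(c("navy", "white", "firebrick"))(256)

head(diverging, 3)
```

The result could then be used as col = diverging in any of the heatmap.2 calls above.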

Heat map with row clusters

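It can be hard to spot patterns by visually comparing across rows and columns. Adding a dendrogram helps: heatmap.2 then reorders the rows using hierarchical clustering (by default, complete linkage on Euclidean distances), so years with similar monthly profiles appear next to each other.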

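heatmap.2(campus_gas_mat, Colv = FALSE, scale = "column",  # Rowv defaults to TRUE, so rows are clustered
          col = "bluered", margins = c(10, 20),
          colsep = 1:ncol(campus_gas_mat), rowsep = 1:nrow(campus_gas_mat),
          sepwidth = c(0.05, 0.05), sepcolor = "white",
          cexRow = 3, cexCol = 3, trace = "none", dendrogram = "row")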
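To see what the row dendrogram is built from, we can run the clustering directly. This minimal sketch uses the year-by-month values from the campus_gas_costs3 printout in 5.2. It pairs dist() (Euclidean by default) with hclust() (complete linkage by default), which are heatmap.2's default distfun and hclustfun. heatmap.2 additionally reorders branches by row means, so its leaf order may differ from the one printed here:

```r
# Year-by-month values copied from the campus_gas_costs3 printout in 5.2
campus_gas_mat <- matrix(
  c(29987, 30244, 36678, 20129, 16397, 14366, 15456,  4640, 18283, 20923, 30530, 43560,
    45839, 39610, 41359, 37562,  5679,  1214,  2176,  1555,  4810, 25654, 33531, 37935,
    42381, 41929, 36875, 22373, 21404, 11718, 10291, 12283, 16634, 23794, 34159, 39948,
    40995, 44838, 31633, 23363,  7611,  3970,   725,   820, 22041, 27299, 39442, 56493,
    29907, 56962, 38346, 30617, 19831,  1979,  1505,  1393,  9763, 25353, 33828, 38811),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(2017:2021), month.abb)
)

# Euclidean distances between years (rows), then complete-linkage clustering
row_clusters <- hclust(dist(campus_gas_mat))

row_clusters$labels[row_clusters$order]    # leaf order, left to right
year_groups <- cutree(row_clusters, k = 2) # cluster label (1 or 2) for each year
year_groups
```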

Heat map with column clusters

We can also construct a heat map that clusters the columns (here, the months) instead of the rows:

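heatmap.2(campus_gas_mat, Rowv = FALSE, scale = "column",  # Colv defaults to clustering the columns
          col = "bluered", margins = c(10, 20),
          colsep = 1:ncol(campus_gas_mat), rowsep = 1:nrow(campus_gas_mat),
          sepwidth = c(0.05, 0.05), sepcolor = "white",
          cexRow = 3, cexCol = 3, trace = "none", dendrogram = "column")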
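By default heatmap.2 clusters columns by Euclidean distance. One variant worth trying is to cluster months by how similarly they rise and fall across years, using 1 minus the Pearson correlation as the distance. This is a swapped-in technique, not what the code above does. Here is a sketch using the values from the campus_gas_costs3 printout in 5.2:

```r
# Year-by-month values copied from the campus_gas_costs3 printout in 5.2
campus_gas_mat <- matrix(
  c(29987, 30244, 36678, 20129, 16397, 14366, 15456,  4640, 18283, 20923, 30530, 43560,
    45839, 39610, 41359, 37562,  5679,  1214,  2176,  1555,  4810, 25654, 33531, 37935,
    42381, 41929, 36875, 22373, 21404, 11718, 10291, 12283, 16634, 23794, 34159, 39948,
    40995, 44838, 31633, 23363,  7611,  3970,   725,   820, 22041, 27299, 39442, 56493,
    29907, 56962, 38346, 30617, 19831,  1979,  1505,  1393,  9763, 25353, 33828, 38811),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(2017:2021), month.abb)
)

# cor() on a matrix correlates its columns, giving a 12 x 12 month-by-month matrix;
# months that move together across years get a distance near 0
month_dist <- as.dist(1 - cor(campus_gas_mat))
month_clusters <- hclust(month_dist)

month_groups <- cutree(month_clusters, k = 3)  # cluster label (1 to 3) for each month
month_groups
```

To use the same distance inside the heat map, heatmap.2 accepts a distfun argument, e.g. distfun = function(x) as.dist(1 - cor(t(x))). heatmap.2 applies distfun to the transposed matrix when clustering columns, so this function yields the month-to-month correlation distance shown above. With only five years per month, these correlations are noisy.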

5.3 Other Options

  • The heatmap3 package makes heat maps in a similar way to the heatmap.2 function above.

  • Star plots, drawn with the stars function from base R’s graphics package, show one star per row with one ray per column.
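Here is a minimal star plot sketch using the year-by-month values from the campus_gas_costs3 printout in 5.2. It draws one star per year. Because stars() defaults to scale = TRUE, each month's column is rescaled to the range 0 to 1, so every ray spans the same range:

```r
# Year-by-month values copied from the campus_gas_costs3 printout in 5.2
campus_gas_mat <- matrix(
  c(29987, 30244, 36678, 20129, 16397, 14366, 15456,  4640, 18283, 20923, 30530, 43560,
    45839, 39610, 41359, 37562,  5679,  1214,  2176,  1555,  4810, 25654, 33531, 37935,
    42381, 41929, 36875, 22373, 21404, 11718, 10291, 12283, 16634, 23794, 34159, 39948,
    40995, 44838, 31633, 23363,  7611,  3970,   725,   820, 22041, 27299, 39442, 56493,
    29907, 56962, 38346, 30617, 19831,  1979,  1505,  1393,  9763, 25353, 33828, 38811),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(2017:2021), month.abb)
)

# One star per row (year), one ray per column (month); row names become labels
stars(campus_gas_mat, main = "Monthly values by year")
```

Passing key.loc = c(x, y) adds a key that shows which ray corresponds to which month. The right coordinates depend on the layout, so some trial and error may be needed.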


6 Explore

Exercise 6.1 (Additional exploration)

  1. Write down one additional research question you think can be answered with this data.
  2. Make a visualization to explore that question.
  1. One potential research question to explore with this data set is how natural gas costs vary across buildings, and whether cost is related to building size or other factors.
ggplot(gas_costs_rounded, aes(x = month, y = building, fill = total_cost)) +
  geom_tile() +  # one tile per (month, building); fill shows total cost
  ggtitle("Olin's Total Natural Gas Cost by Month and Building")
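In a building-by-month heat map, ordering the buildings by total cost makes the most expensive buildings easier to find. Here is a sketch using forcats::fct_reorder() on made-up toy costs. It assumes gas_costs_rounded has columns building, month, and total_cost, as in the ggplot call above:

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Made-up toy costs (column names assumed to match gas_costs_rounded)
toy_costs <- tibble(
  building   = rep(c("Building A", "Building B", "Building C"), each = 2),
  month      = rep(1:2, times = 3),
  total_cost = c(500, 450, 1500, 1400, 900, 950)
)

# Order building levels by summed cost; on a discrete y-axis the last level
# is drawn at the top, so the most expensive building appears there
toy_costs <- toy_costs %>%
  mutate(building = fct_reorder(building, total_cost, .fun = sum))

p <- ggplot(toy_costs, aes(x = month, y = building, fill = total_cost)) +
  geom_tile() +
  ggtitle("Toy Total Natural Gas Cost by Month and Building")
```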